- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.4k
[skip-ci][NFC][ntuple] add schema evolution docs #20079
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
a8d2a1f    to
    1da82a2      
    Compare
  
    1da82a2    to
    586ad7c      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Note for classes that have been given a version number, users must increase the class version number whenever the persistent layout changes (members marked as transient do not participate in the persistent layout). | 
In addition there is a quirk that we currently also need to increase the version number of a derived class if the base class changes. It is a really annoying feature and we should 'fix' it and thus it may or may not be worth mentioned it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean? The TClass will be confused and issue an error/warning message if there is 2 StreamerInfo with the same version number but different checksum.
In the subset of cases where the StreamerInfo for the class stored in RNTuple is not stored in the file, then the above might be true as then the RNTuple layout is then clearly known to be its own but having the same version number marked in the RNTuple and the in memory layout will make the use of rules 'harder'.
Note that this is distinct from the case where the class is not (user) versioned at all. In this case each schema layout is distinct and look up is based on the CheckSum value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may or may not deserve as asterix. I am guess the bound check is based on the underlying type rather than on the list of 'valid'/'explicit' enums values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does we (eventually) want a (possibly optional) bound check here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we discussed a check based on the floating-point class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a mention of the cases with fixed length collections (both vs other fixed length collections and variable collections). For example are the following supported?:
| In-memory type | Compatible on-disk types | 
|---|---|
| int[8] | int[4] | 
| std::vector<int> | int[4] | 
| std::array<int,4> | int[4] | 
And vice et versa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: std::array -> variable-length collection is already mentioned (but not the reverse).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this mention (as a preview of the text) below whether this include user types related via a renaming rule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::list and std::deque are not explicitly listed, are they supported here?  (If they are not actively disallowed then they will likely just 'work' via the 'User-defined collection' code path (i.e. they have a CollectionProxy)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither are supported in RNTuple at the moment: https://github.com/root-project/root/blob/master/tree/ntuple/doc/BinaryFormatSpecification.md#stdlib-types-and-collections
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless we actively do 'something' explicit about it, the std::list (and likely std::deque) should be storable via the CollectionProxy.   (I.e. technically they would be handled here in the User-defined collection)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's indeed a problem since we only have the check in RClassField:
root/tree/ntuple/src/RFieldMeta.cxx
Lines 134 to 136 in fe2834d
| if (fClass->Property() & kIsDefinedInStd) { | |
| throw RException(R__FAIL(std::string(GetTypeName()) + " is not supported")); | |
| } | 
root [0] ROOT::RFieldBase::Create("l", "std::list<int>").Unwrap()
(std::unique_ptr<ROOT::RFieldBase, std::default_delete<ROOT::RFieldBase> >) std::unique_ptr -> 0x3737380
root [1] typeid(*ROOT::RFieldBase::Create("l", "std::list<int>").Unwrap().get()).name()
(const char *) "N4ROOT23RProxiedCollectionFieldE"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might deserve a note/comment/asterix hinting at the reason why other transformation can not be supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor)
| For the exact syntax of customization rules, please refer to the ROOT manual. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we mention that if the given type differs from the on file type, the source member will undergo schema evolution before being passed to the rule's function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A source member can be read them into any type compatible to its on-disk type
What is the (them) part refering to?
| | Layout Change | Also supported in Untyped Records | Comment | | ||
| | --------------------------------------- | --------------------------------- | -------------------- | | ||
| | Remove member | Yes | Match by member name | | ||
| | Add member | Yes | Match by member name | | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we specify in the comment that new members are default initialized?
| | | enum | with bounds check | | ||
| |-----------------------------|-----------------------------|----------------------| | ||
| | `std::[u]int[8,16,32,64]_t` | `bool` | | | ||
| | | `char` | | | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably we should specify the behavior when the value doesn't fit in the char
| |-----------------------------|-----------------------------|----------------------| | ||
| | float | double | | | ||
| |-----------------------------|-----------------------------|----------------------| | ||
| | double | float | | | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No bounds check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for this line of the table, it's not needed because double can store a superset of values that can be represented by float on-disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, this is the "read" table - then I guess I meant the line above
| ROOT I/O customization rules allow for custom code handling the transformation | ||
| from the on-disk schema to the in-memory model. | ||
| Customization rules are part of the class dictionary. | ||
| For the exact syntax of customization rules, we refer to the ROOT manual. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a link to the manual here?
| - Target members of the target class, i.e. those class members whose value is set by the rule. | ||
| Target members must be direct members, i.e. not part of a base class. | ||
| - A source class (possibly having a different class name than the target class) | ||
| together with class versions or class checksums | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be useful to mention that the class checksum can be retrieved from TClass::GetCheckSum()
| - Source members of the source class; the given source members will be read as the given type. | ||
| Source members can also be from a base class. | ||
| Note that there is no way to specify a base class member that has the same name as a member in the derived class. | ||
| - The custom code snippet; the code snippet has access to the (whole) target object and to the given source members. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an example of a customization rule would be very useful to anyone not already familiar with them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we discussed a check based on the floating-point class.
| |-----------------------------|-----------------------------|----------------------| | ||
| | float | double | | | ||
| |-----------------------------|-----------------------------|----------------------| | ||
| | double | float | | | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for this line of the table, it's not needed because double can store a superset of values that can be represented by float on-disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither are supported in RNTuple at the moment: https://github.com/root-project/root/blob/master/tree/ntuple/doc/BinaryFormatSpecification.md#stdlib-types-and-collections
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| | `ROOT::RVec<T>` | `std::vector<T'>` | with size check | | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::vector and / or ROOT::RVec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::vector and / or ROOT::RVec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now listed in the table, do we need / want to discuss it separately?
No description provided.